Geo-Spatial Analysis on UFO sighting data

Experiment
Geo-Spatial
Data-Analysis
Author

F.L

Published

June 8, 2024

Introduction

I first encounter UFO for my master degree. Not literally. I need a project for my “Data Viz” module. But back then my geo-spatial analysis skills was limited.

This little project will look over these data an attempt to answer questions such as:

  • Where can I find aliens?
  • What does a UFO typically looks like (based on description)?
  • Has alien left us since the 1980s? (temporal patterns)

UFO Heatmap 1910-2014

The Data

Data includes both logitude location and times. So can do both time series or spatial viz.  Source: https://www.kaggle.com/NUFORC/ufo-sightings

The complete data includes entries where the location of the sighting was not found or blank (0.8146%) or have an erroneous or blank time (8.0237%). Since the reports date back to the 20th century, some older data might be obscured. Data contains city, state, time, description, and duration of each sighting.

library(tidyverse)
library(sf)
library(leaflet)
## read data and clean date
ufo <- read_csv('./scrubbed.csv') |> 
  janitor::clean_names() |> 
  mutate(
     datetime = as_datetime(datetime, format = "%m/%e/%Y %R")
    ,date_posted = as_date(date_posted, format="%m/%e/%Y")) |> 
  mutate(id = row_number()) |> 
  relocate(id)

## convert to sf object
ufo_points = ufo |> 
  filter(!is.na(latitude) & !is.na(longitude)) |> 
  filter(!(latitude == 0 & longitude == 0)) |> 
  drop_na() |> 
  sf::st_as_sf(coords=c("longitude","latitude"),crs=4326)

# df_scr <- read_csv('./scrubbed.csv') |> janitor::clean_names()

## Below used to analysis how to format date
# ## analysis to check which one is digit which one is date, which one is month
# df_com |> 
#   select(datetime) |> 
#   mutate(
#     md1 = str_extract(datetime, "^\\d{1,2}")
#     ,md2 = str_extract(datetime, "(?<=[\\d]{2}\\/)\\d{1,2}")
#   ) |> 
#   distinct(md1,md2)
# ## first digit is month, second is date
# stopifnot(0!=df_com |> 
#   select(datetime) |> 
#   mutate(datetime_ = as_datetime(datetime, format="%m/%e/%Y %R")) |> 
#   filter(day(datetime_) > 12) |> 
#   nrow())
# ## the format you want to extract is "%m/%e/%Y %R" use lubridate dattiem

Grand Map View

Only a proportion of data are kept.

[1] "The propotion of data visualised is: 0.83"
[1] "1910-01-02 UTC" "2014-05-08 UTC"

This is all the data we can find

create UFO icon map - make icon
## over all scale using 
ufo_icon = makeIcon(iconUrl = "./asset/ufo.png",
                    iconWidth = 25, iconHeight = 25)

## a javascript function which creates clusters
clus_func_js = function() {
  JS("  
    function(cluster) {  
       return new L.DivIcon({  
         html: '<div style=\"background-color:rgba(77,77,77,0.05)\"><span>' + cluster.getChildCount() + '</div><span>',  
         className: 'marker-cluster'  
       });  
       }")
}

addUFOMarker = function(x) {
  addMarkers(
    x,
    # ~longitude, ~latitude,
    icon = ufo_icon,  
    clusterOptions = markerClusterOptions(  
      iconCreateFunction = clus_func_js()
    ),  
    label = ~ label,
    popup = ~comments_,
    group = "ufo"
  )
}

## Make UFO Icon ---------------------------------------------------------------
## access polygon data
country_polygons = spData::world |> 
  filter(iso_a2 %in% c("GB","US","CA"))

## make grid on top of the layer
mesh=sf::st_make_grid(
     country_polygons
    ,cellsize = c(0.5,0.5)
    ,square = T
  )
## intersect points 
ids = st_intersects(mesh,ufo_points)
which_non_zeros = which(lengths(ids) != 0)
pixels =st_as_sf(x=mesh[which_non_zeros])
pixels$n = lengths(ids)[which_non_zeros]

## Make the Map ----------------------------------------------------------------

## leaflet color palete
pal <- colorNumeric(
  palette = "PuBu",
  domain = sqrt(pixels$n))

## use base layer of map
## set view to USA
us_lon <- -95.71 #W
us_lat <- 37.09 #N
base_map = ufo_points |> 
  mutate(
     label = paste(city,state,country,shape)
    ,comments_ = paste0(datetime,": ", comments)
  ) |> 
  leaflet()  |> 
  setView(lng = us_lon, lat = us_lat, zoom =4) |> 
  addProviderTiles(providers$CartoDB.Positron)

## map
base_map |> 
  addPolygons(
    data=pixels,stroke=F,fillColor = ~pal(sqrt(pixels$n))
    , fillOpacity = 0.8
    , group = "heatmap") |> 
  addUFOMarker() |> ## leaflet color palete
  addLayersControl(
    overlayGroups=c("ufo","heatmap")
  )

If you explore this map, you will find there are barely any record outside North America The majority of sighting concentrate on US. Actually if you identify the rest of the data. Majority comes from a country that is not US…

ufo |> 
  left_join(select(ufo_points,id),"id",keep=T,suffix=c("","_p")) |>
  filter(is.na(id_p)) |> 
  slice_sample(n=10)
ufo |> 
  left_join(select(ufo_points,id),"id",keep=T,suffix=c("","_p")) |>
  filter(is.na(id_p)) |> 
  group_by(country) |> 
  # mutate(country=coalesce(country,paste("-",state))) |> 
  count() |> 
  arrange(desc(n))

We are missing 1894 Great Britain sighting.

In some cases the “UFOs” concentrate at one point:

(a) Concentrated Sightings at NYC
(b) Civic Centre in 1964
Figure 1: About 500 sighting squished into one spot, nearest location is NY civic center.

It is likely that those data does not have specific coordinate so they are squished together into one marker.

Sighting Over Time

Next lets overview if there are any temporary pattern in terms of number of UFO sightings.

animation map
library(plotly)
library(RColorBrewer)
mapbox_token = Sys.getenv("MAPBOX_TOKEN")

xy=sf::st_coordinates(ufo_points)

ufo_points$lon = xy[,1]
ufo_points$lat = xy[,2]
ufo_points$year = year(ufo_points$datetime)
ufo_points |> 
  plot_ly(
    type='densitymapbox'
    ,frame = ~year
  ) |> 
  add_trace(
     radius = 5
    ,lon = ~lon
    ,lat = ~lat
    # ,coloraxis = 'coloraxis'
    ,colorscale="Electric"
    ,showlegend=F
  ) |> 
  config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN")) |> 
  layout(
    mapbox = list(
      # style="carto-positron",
      style="dark",
      zoom=2.5,
      center= list(lon=us_lon,lat=us_lat))
    # , coloraxis = list(colorscale="Viridis")
    ) |> 
  animation_opts(
    150,
    easing = "sin", #redraw = FALSE
  )
Code
ufo |> 
  left_join(select(ufo_points,id),"id",keep=T,suffix=c("","_p")) |>
  mutate(
    month_year=floor_date(datetime,"months"),
    miss_coord=is.na(id_p)) |> 
  group_by(month_year,miss_coord) |> 
  summarise(n=n(),.groups="drop") |> 
  ggplot(aes(x=month_year,y=n,color=miss_coord)) +
  geom_line() +
  theme_minimal() +
  ylab("number of occurance") +
  xlab("month and year") + 
  geom_vline(xintercept=as.POSIXct(ymd("1990-01-01")),linetype="dashed",color="midnightblue") +
  annotate("text"
        ,x=as.POSIXct(ymd("1990-01-01")),y=700
        ,label="Year 1990",color="midnightblue") +
  ggtitle("UFO Sighting between 1910-2014 around the World") +
  scale_color_manual(
    values=c("TRUE"="grey","FALSE"="black"),name="",labels=c("TRUE"="Not on Map","FALSE"="On Map")) +
  theme(legend.position="top")

Below we are focusing on the US, analyzing 1990 onward:

Code
## Seasonal Pattern
ufo |> 
  mutate(month_w = isoweek(datetime),year=year(datetime)) |> 
  group_by(month_w,year,country) |> 
  filter(country == "us") |> 
  filter(year >= 1990) |> 
  summarise(n = as.double(n()), .groups="drop") |> 
  ungroup() |> 
  mutate(across(c(month_w,year), as.factor)) |> 
  ggplot(aes(x = month_w,y=year,fill=n)) + 
  geom_bin2d() +
  facet_wrap(~country,scales="free") +
  theme_minimal() +
  scale_fill_viridis_c(option="E",trans="sqrt") +
  # geom_vline(xintercept=c(6,7),linetype="dashed",alpha=c(0.3,1)) +
  coord_polar(start = -pi/53) +
  ggtitle("Count of UFO sighting in US by Week 1990 - 2014") +
  theme(legend.position = "none")
## temporary pattern
ufo |> 
  filter(country == "us") |> 
  mutate(hour = hour(datetime),year=year(datetime)) |> 
  filter(year >= 1990) |>
  group_by(hour,year) |> 
  summarise(n = n()/(365 * 24),.groups="drop") |> 
  ungroup() |> 
  ## ploting
  mutate(across(c(hour,year), as.factor)) |> 
  ggplot(aes(x = hour,y=year,fill=n)) + 
  geom_bin2d() +
  scale_fill_viridis_c(option="E",trans="sqrt") +
  theme_minimal() +
  scale_y_discrete(labels= c() ) +
  coord_polar(start= -pi/24) +
  ## annotate a layer of 
  geom_vline(aes(xintercept=21),color="black",linetype="dashed") +
  ggtitle("Every hour, average sightings of UFO in US, 1990 - 2014") +
  theme(
    legend.position = "none")

The hourly pattern is more clear than yearly pattern. The peak interval for UFO spotting time is 20:00 - 22:00 (the usual time when I decide to pull a glass of wine). There are not much pattern in terms of week other than than the fact that it tend to happen in holidays.

There are systematic ways of analyzing cycles. A spectrum analysis could be extremely helpful here. Or a “periodontal”. I am also curious if incorporating public holidays in the US will explain more temporary variance. This two maybe solved by fit a machine learning model.

Considerably, where the time occurs is also important. The animated map has verified that sighting tend to happen “high population density” cities. Rumor say UFO sighting happens near water because the flying source use water as fuel. Arguably, for archaeology reasons, cities tends to developed around rivers and ports. So it is hard to tell if UFO appears because of water or cities, until we whichever better explains variance in the model.